Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docsp-31169 - change stream schema inference #164

Merged
merged 6 commits into from
Jul 13, 2023

Conversation

mongoKart
Copy link
Contributor

Copy link
Contributor

@rustagir rustagir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a few things

@@ -42,6 +42,18 @@ Overview

.. include:: /scala/filters.txt

.. important:: Change Stream Schema Inference

When the {+driver-short+} infers the schema of a data frame
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: should we rename this source constant since the product is not a driver?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

love it

Comment on lines 333 to 339
| When set to ``true``:

- The connector filters out messages that
omit the ``fullDocument`` field and only publishes the value of the
field.
- If you don't specify a schema, the connector infers the schema
from the change stream document rather than from the underlying collection.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I: per the style guide re: lists, do not start a list with a fragment

Suggested change
| When set to ``true``:
- The connector filters out messages that
omit the ``fullDocument`` field and only publishes the value of the
field.
- If you don't specify a schema, the connector infers the schema
from the change stream document rather than from the underlying collection.
| The connector displays the following behavior when this property is set to ``true``:
- The connector filters out messages that
omit the ``fullDocument`` field and only publishes the value of the
field.
- If you don't specify a schema, the connector infers the schema
from the change stream document rather than from the underlying collection.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great catch

Comment on lines 48 to 49
read from a change stream, by default,
it will use the schema of the underlying collection rather than the schema
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

S: Remove the "by default" here. It is implicit as the next sentence explains how to change the behavior

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you mean, but to me it benefits from some indication that this behavior can be changed (apart from the following sentence, I mean). Especially when this isn't the same page where this option is documented, so the reader might not be in the mindset of settings they can turn on or off.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense to me!

Comment on lines 47 to 50
When the {+driver-short+} infers the schema of a data frame
read from a change stream, by default,
it will use the schema of the underlying collection rather than the schema
of the change stream. To instruct the connector to use the schema of the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
When the {+driver-short+} infers the schema of a data frame
read from a change stream, by default,
it will use the schema of the underlying collection rather than the schema
of the change stream. To instruct the connector to use the schema of the
When the {+driver-short+} infers the schema of a data frame
read from a change stream, by default,
it will use the schema of the underlying collection rather than of the change stream.
To instruct the connector to use the schema of the

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Used your suggestion plus "that" to hopefully make it more clear. LMK

Comment on lines 50 to 52
of the change stream. To instruct the connector to use the schema of the
change stream, set the ``change.stream.publish.full.document.only`` option
to ``true``.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
of the change stream. To instruct the connector to use the schema of the
change stream, set the ``change.stream.publish.full.document.only`` option
to ``true``.
of the change stream. The connector infers the schema from the
change stream if you set the ``change.stream.publish.full.document.only`` option
to ``true``.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reworded. just for my info, what don't you like about the first version?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The phrasing of "instruct the connector" seemed needlessly complicated. Better to just have it be a declarative sentence, IMO. but it just a suggestion :)

Comment on lines 54 to 55
For more information on configuring a read operation, see the
:ref:`spark-read-conf` guide.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's what i had initially. not sure why i changed

Copy link
Contributor

@rustagir rustagir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

caitlindavey and others added 2 commits July 13, 2023 10:45
* Spark Connector Minor versions 10.2

* Delete settings.json

Deleting VS code settings
@mongoKart mongoKart merged commit 099521a into mongodb:master Jul 13, 2023
1 check passed
mongoKart added a commit that referenced this pull request Jul 13, 2023
Co-authored-by: Caitlin Davey <caitlin@caitlindavey.com>
(cherry picked from commit 099521a)
@mongoKart mongoKart deleted the docsp-31169-infer-schema branch July 13, 2023 15:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants